Goto

Collaborating Authors

 causal forest


Calibrated Inference for the Conditional Average Treatment Effect in the Few-Placebo Regime via Gaussian Processes

arXiv.org Machine Learning

Estimating how much an intervention helps a given individual the conditional average treatment effect (CATE) is increasingly central to decision-making in medicine, economics, and policy, where an estimate is most useful when accompanied by a calibrated uncertainty interval. We study the few-placebo regime, in which one treatment arm is much smaller than the other, as arises in unequal-allocation trials and small-holdout $A/B$ tests. The standard estimator in this setting is the X-Learner, and a natural way to obtain credible intervals is to make its second stage Bayesian. We show that these intervals under-cover: they contain the true effect less often than their nominal level. We trace this to a structural cause the X-Learner's regression target inherits the bias of a nuisance model fitted to the small arm, so the posterior is centered away from the true effect and we find that the standard remedy, regressing an orthogonal doubly-robust score, is also unreliable here, since the regime's limited overlap leaves the estimator either highly variable or, once stabilized, biased once more. Both consequences reflect a pattern that extends beyond causal inference: a separately estimated variance is attached to a point estimate of a hard-to-learn quantity, and the point estimate's bias is not captured by that variance. We propose GP-CATE, which models each arm's outcome surface with a Gaussian process, so the scarce arm's uncertainty enters the posterior directly rather than as an unmodelled bias. Across synthetic and semi-synthetic benchmarks, GP-CATE attains calibrated coverage where the estimators we compare against including Causal Forest and BART do not, at the cost of intervals that are appropriately wide when the data are uninformative.




causalfe: Causal Forests with Fixed Effects in Python

arXiv.org Machine Learning

The causalfe package provides a Python implementation of Causal Forests with Fixed Effects (CFFE) for estimating heterogeneous treatment effects in panel data settings. Standard causal forest methods struggle with panel data because unit and time fixed effects induce spurious heterogeneity in treatment effect estimates. The CFFE approach addresses this by performing node-level residualization during tree construction, removing fixed effects within each candidate split rather than globally. This paper describes the methodology, documents the software interface, and demonstrates the package through simulation studies that validate the estimator's performance under various data generating processes.


Causal-Policy Forest for End-to-End Policy Learning

arXiv.org Machine Learning

This study proposes an end-to-end algorithm for policy learning in causal inference. We observe data consisting of covariates, treatment assignments, and outcomes, where only the outcome corresponding to the assigned treatment is observed. The goal of policy learning is to train a policy from the observed data, where a policy is a function that recommends an optimal treatment for each individual, to maximize the policy value. In this study, we first show that maximizing the policy value is equivalent to minimizing the mean squared error for the conditional average treatment effect (CATE) under $\{-1, 1\}$ restricted regression models. Based on this finding, we modify the causal forest, an end-to-end CATE estimation algorithm, for policy learning. We refer to our algorithm as the causal-policy forest. Our algorithm has three advantages. First, it is a simple modification of an existing, widely used CATE estimation method, therefore, it helps bridge the gap between policy learning and CATE estimation in practice. Second, while existing studies typically estimate nuisance parameters for policy learning as a separate task, our algorithm trains the policy in a more end-to-end manner. Third, as in standard decision trees and random forests, we train the models efficiently, avoiding computational intractability.


Assessment of the conditional exchangeability assumption in causal machine learning models: a simulation study

arXiv.org Machine Learning

Observational studies developing causal machine learning (ML) models for the prediction of individualized treatment effects (ITEs) seldom conduct empirical evaluations to assess the conditional exchangeability assumption. We aimed to evaluate the performance of these models under conditional exchangeability violations and the utility of negative control outcomes (NCOs) as a diagnostic. We conducted a simulation study to examine confounding bias in ITE estimates generated by causal forest and X-learner models under varying conditions, including the presence or absence of true heterogeneity. We simulated data to reflect real-world scenarios with differing levels of confounding, sample size, and NCO confounding structures. We then estimated and compared subgroup-level treatment effects on the primary outcome and NCOs across settings with and without unmeasured confounding. When conditional exchangeability was violated, causal forest and X-learner models failed to recover true treatment effect heterogeneity and, in some cases, falsely indicated heterogeneity when there was none. NCOs successfully identified subgroups affected by unmeasured confounding. Even when NCOs did not perfectly satisfy its ideal assumptions, it remained informative, flagging potential bias in subgroup level estimates, though not always pinpointing the subgroup with the largest confounding. Violations of conditional exchangeability substantially limit the validity of ITE estimates from causal ML models in routinely collected observational data. NCOs serve a useful empirical diagnostic tool for detecting subgroup-specific unmeasured confounding and should be incorporated into causal ML workflows to support the credibility of individualized inference.




Honesty in Causal Forests: When It Helps and When It Hurts

arXiv.org Machine Learning

Causal forests have become a popular tool for estimating how treatment effects vary across individuals (Wager and Athey, 2018). They are used in a growing number of domains--including marketing, operations, economics, and public policy--to personalize interventions and inform targeting strategies. Since 2019, dozens of papers in INFORMS journals alone have applied causal forests to experimental or observational data (see Appendix C), often with the goal of estimating individual-level treatment effects. The method builds on a familiar idea: instead of estimating a single average effect for the whole population, we split the population into subgroups based on observed features and estimate effects within each group. This is conceptually similar to how random forests estimate outcomes, except now the goal is to estimate causal effects. But there is a crucial modeling difference: unlike random forests, which typically use the full training data for both splitting and estimation, causal forests often divide the training data in two--using one part to decide how to form the subgroups, and the other to estimate effects within them. This practice, known as honest estimation, is meant to prevent overfitting and selection bias (Athey and Imbens, 2016). It is the default in widely used software packages such as grf (Athey et al., 2019) and EconML (Battocchi et al., 2019), and is commonly recommended in applied research. But is this default always a good idea? 1


Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling

arXiv.org Machine Learning

In recent years, two parallel research trends have emerged in machine learning, yet their intersections remain largely unexplored. On one hand, there has been a significant increase in literature focused on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques. These approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP application in identifying predictive biomarkers through CATE models, a crucial aspect in pharmaceutical precision medicine. We address inherent challenges associated with the SHAP concept in multi-stage CATE strategies and introduce a surrogate estimation approach that is agnostic to the choice of CATE strategy, effectively reducing computational burdens in high-dimensional data. Using this approach, we conduct simulation benchmarking to evaluate the ability to accurately identify biomarkers using SHAP values derived from various CATE meta-learners and Causal Forest.